AClAP, Autonomous hierarchical agglomerative Cluster Analysis based protocol to partition conformational datasets
نویسندگان
چکیده
MOTIVATION Sampling the conformational space is a fundamental step for both ligand- and structure-based drug design. However, the rational organization of different molecular conformations still remains a challenge. In fact, for drug design applications, the sampling process provides a redundant conformation set whose thorough analysis can be intensive, or even prohibitive. We propose a statistical approach based on cluster analysis aimed at rationalizing the output of methods such as Monte Carlo, genetic, and reconstruction algorithms. Although some software already implements clustering procedures, at present, a universally accepted protocol is still missing. RESULTS We integrated hierarchical agglomerative cluster analysis with a clusterability assessment method and a user independent cutting rule, to form a global protocol that we implemented in a MATLAB metalanguage program (AClAP). We tested it on the conformational space of a quite diverse set of drugs generated via Metropolis Monte Carlo simulation, and on the poses we obtained by reiterated docking runs performed by four widespread programs. In our tests, AClAP proved to remarkably reduce the dimensionality of the original datasets at a negligible computational cost. Moreover, when applied to the outcomes of many docking programs together, it was able to point to the crystallographic pose. AVAILABILITY AClAP is available at the "AClAP" section of the website http://www.scfarm.unibo.it.
منابع مشابه
Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering
Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N2) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered se...
متن کاملWei‐Bang Chen and Chengcui Zhang – 2 Revision Title: A Hybrid Framework for Protein Sequences Clustering and Classification Using Signature Motif Information
In this paper, we propose an unsupervised hybrid framework for protein sequence clustering and classification which incorporates protein structural motif information. The proposed framework consists of three stages: protein structural motif scan, hybrid clustering, and sequence classification. The incorporation of protein structural motif detected by ScanProsite service provides a better measur...
متن کاملA Model Based Metaheuristic for Hybrid Hierarchical Community Structure in Social Networks
In recent years, the study of community detection in social networks has received great attention. The hierarchical structure of the network leads to the emergence of the convergence to a locally optimal community structure. In this paper, we aim to avoid this local optimum in the introduced hybrid hierarchical method. To achieve this purpose, we present an objective function where we incorpora...
متن کاملRecitation 6: Random forests and affinity propagation
– Partitioning clustering algorithms, construct non-overlapping clusters such that each item is assigned to exactly one cluster. Example: k-means – Agglomerative clustering algorithms construct a hierarchical set of nested clusters, indicating the relatedness between clusters. Example: hierarchical clustering • In classification, we partition data into known labels. For example, we might constr...
متن کاملHigh-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Bioinformatics
دوره 22 14 شماره
صفحات -
تاریخ انتشار 2006